Binomial distribution

Probability mass function
Cumulative distribution function
notation:	B(n, p)
parameters:	n ∈ N₀ (number of trials); p ∈ [0,1] (success probability in each trial)
support:	k ∈ { 0, …, n }
pmf:	$\textstyle {n \choose k}\, p^k (1-p)^{n-k}$
cdf:	$\textstyle I_{1-p}(n - k, 1 + k)$
mean:	np
median:	⌊np⌋ or ⌈np⌉
mode:	⌊(n + 1)p⌋ or ⌊(n + 1)p⌋ − 1
variance:	np(1 − p)
skewness:	$\frac{1-2p}{\sqrt{np(1-p)}}$
ex.kurtosis:	$\frac{1-6p(1-p)}{np(1-p)}$
entropy:	$\frac12 \log_2 \big( 2\pi e\, np(1-p) \big) + O \left( \frac{1}{n} \right)$
mgf:	$(1-p + pe^t)^n \!$
cf:	$(1-p + pe^{it})^n \!$

In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. In fact, when n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

It is frequently used to model the number of successes in a sample of size n drawn from a population of size N. If the sampling is done without replacement, the draws are not independent, so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution is a good approximation and is widely used.
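As a rough numerical illustration of this approximation, the sketch below compares the hypergeometric and binomial probabilities term by term. The population size, number of successes and sample size are illustrative assumptions, not values from the text, and the helper names are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Pr(K = k) for K ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeometric_pmf(k, n, N, K):
    """Pr(k successes) when drawing n items without replacement
    from a population of N items, K of which are successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Assumed illustrative values: N is much larger than n.
N, K, n = 10_000, 3_000, 10
p = K / N

# Largest pointwise difference between the two mass functions.
max_gap = max(
    abs(hypergeometric_pmf(k, n, N, K) - binomial_pmf(k, n, p))
    for k in range(n + 1)
)
print(f"largest pointwise difference: {max_gap:.2e}")
```

Shrinking N towards n in this sketch makes the gap grow, which is the regime where the hypergeometric distribution should be used directly.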

Examples

An elementary example is this: roll a standard die ten times and count the number of fours. The distribution of this random number is a binomial distribution with n = 10 and p = 1/6.

As another example, flip a coin three times and count the number of heads. The distribution of this random number is a binomial distribution with n = 3 and p = 1/2.
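The coin example is small enough to check by brute force. The sketch below enumerates all eight equally likely outcomes of three fair flips and tabulates the number of heads; it assumes a fair coin, as in the example, and uses only the standard library.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 2**3 equally likely outcomes of three fair coin flips.
outcomes = list(product("HT", repeat=3))
heads_counts = Counter(flips.count("H") for flips in outcomes)

# Probability of each number of heads, as an exact fraction.
distribution = {
    k: Fraction(count, len(outcomes))
    for k, count in sorted(heads_counts.items())
}
for k, prob in distribution.items():
    print(k, prob)
```

The resulting probabilities are 1/8, 3/8, 3/8 and 1/8 for 0, 1, 2 and 3 heads, which is the B(3, 1/2) distribution.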

Specification

Probability mass function

In general, if the random variable K follows the binomial distribution with parameters n and p, we write K ~ B(n, p). The probability of getting exactly k successes in n trials is given by the probability mass function:

$f(k;n,p) = \Pr(K = k) = {n\choose k}p^k(1-p)^{n-k}$

for k = 0, 1, 2, ..., n, where

${n\choose k}=\frac{n!}{k!(n-k)!}$

is the binomial coefficient (hence the name of the distribution) "n choose k", also denoted C(n, k), _nC_k, or ⁿC_k. The formula can be understood as follows: we want k successes (p^k) and n − k failures ((1 − p)^(n − k)). However, the k successes can occur anywhere among the n trials, and there are C(n, k) different ways of distributing k successes in a sequence of n trials.
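The probability mass function translates almost directly into code. The following is a minimal sketch using Python's standard `math.comb`; the function name `binomial_pmf` is a hypothetical helper, and the die parameters are taken from the example earlier in the article.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Pr(K = k) for K ~ B(n, p); zero outside the support 0..n."""
    if not 0 <= k <= n:
        return 0.0
    # C(n, k) orderings, each with probability p^k (1 - p)^(n - k).
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Number of fours in ten rolls of a standard die: n = 10, p = 1/6.
die = [binomial_pmf(k, 10, 1 / 6) for k in range(11)]
for k, prob in enumerate(die):
    print(f"{k:2d}  {prob:.6f}")
```

For large n, a direct product like this can underflow in floating point; working with logarithms is a common alternative.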

In creating reference tables for binomial distribution probability, usually the table is filled in up to n/2 values. This is because for k > n/2, the probability can be calculated by its complement as

$f(k;n,p)=f(n-k;n,1-p). \,$

So, one must look to a different k and a different p (the binomial is not symmetrical in general). However, its behavior is not arbitrary. There is always an integer m that satisfies

$(n+1)p-1 < m \leq (n+1)p. \,$

As a function of k, the expression ƒ(k; n, p) is monotone increasing for k < m and monotone decreasing for k > m, except when (n + 1)p is an integer. In that case, ƒ attains its maximum at two values, m = (n + 1)p and m − 1. m is known as the most probable (most likely) outcome of the Bernoulli trials. Note that the probability of it occurring can be fairly small.
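The location of the maximum can be checked by exhaustive search. The sketch below uses exact rational arithmetic so that ties are detected reliably; `modes` is a hypothetical helper, and the second parameter pair is an assumed example in which (n + 1)p is an integer.

```python
from fractions import Fraction
from math import comb, floor

def modes(n, p):
    """Return every k at which the B(n, p) mass function is maximal.
    p should be a Fraction so that equal probabilities compare equal."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    peak = max(pmf)
    return [k for k, q in enumerate(pmf) if q == peak]

# (n + 1)p = 11/6 is not an integer: a single mode at floor(11/6) = 1.
print(modes(10, Fraction(1, 6)), floor(11 * Fraction(1, 6)))

# (n + 1)p = 3 is an integer: two modes, m = 3 and m - 1 = 2.
print(modes(5, Fraction(1, 2)))
```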

Cumulative distribution function

The cumulative distribution function can be expressed as:

$F(x;n,p) = \Pr(X \le x) = \sum_{i=0}^{\lfloor x \rfloor} {n\choose i}p^i(1-p)^{n-i}.$

where $\scriptstyle \lfloor x\rfloor\,$ is the "floor" under x, i.e. the greatest integer less than or equal to x.
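The summation form of the cumulative distribution function can be sketched as follows. The function name is a hypothetical helper; real-valued x is handled through the floor, as in the formula above.

```python
from math import comb, floor

def binomial_cdf(x, n, p):
    """Pr(X <= x) for X ~ B(n, p), for any real x."""
    if x < 0:
        return 0.0
    upper = min(floor(x), n)  # terms beyond n contribute nothing
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(upper + 1))

# Three fair coin flips: Pr(X <= 1) = 1/8 + 3/8.
print(binomial_cdf(1, 3, 0.5))
# Non-integer arguments are floored, so 1.7 gives the same value.
print(binomial_cdf(1.7, 3, 0.5))
```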

It can also be represented in terms of the regularized incomplete beta function, as follows:

$\begin{align} F(k;n,p) & = \Pr(X \le k) = I_{1-p}(n-k, k+1) \\ & = (n-k) {n \choose k} \int_0^{1-p} t^{n-k-1} (1-t)^k \, dt. \end{align}$

For k ≤ np, upper bounds for the lower tail of the distribution function can be derived. In particular, Hoeffding's inequality yields the bound

$F(k;n,p) \leq \exp\left(-2 \frac{(np-k)^2}{n}\right), \!$

and Chernoff's inequality can be used to derive the bound

$F(k;n,p) \leq \exp\left(-\frac{1}{2\,p} \frac{(np-k)^2}{n}\right). \!$

Moreover, these bounds are reasonably tight when p = 1/2, since the following expression holds for all k ≥ 3n/8:^[1]

$F(k;n,1/2) \geq \frac{1}{15} \exp\left(- \frac{16 (n/2 - k)^2}{n}\right). \!$
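To get a feel for how these bounds compare with the exact lower tail, the sketch below evaluates all three next to the exact cumulative probability. The choice n = 100, p = 1/2 is an illustrative assumption, and the function names are hypothetical.

```python
from math import comb, exp

def binomial_cdf(k, n, p):
    """Exact Pr(X <= k) for integer k."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def hoeffding_bound(k, n, p):
    """Upper bound on F(k; n, p), valid for k <= np."""
    return exp(-2 * (n * p - k) ** 2 / n)

def chernoff_bound(k, n, p):
    """Upper bound on F(k; n, p), valid for k <= np."""
    return exp(-((n * p - k) ** 2) / (2 * p * n))

def lower_bound_half(k, n):
    """Lower bound on F(k; n, 1/2), stated for k >= 3n/8."""
    return exp(-16 * (n / 2 - k) ** 2 / n) / 15

n, p = 100, 0.5  # assumed example values
for k in (40, 45, 50):
    print(k, lower_bound_half(k, n), binomial_cdf(k, n, p),
          hoeffding_bound(k, n, p), chernoff_bound(k, n, p))
```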

Mean and variance

If X ~ B(n, p) (that is, X is a binomially distributed random variable), then the expected value of X is

$\operatorname{E}[X] = np$

and the variance is

$\operatorname{Var}[X] = np(1 - p).$

This fact is easily proven as follows. Suppose first that we have a single Bernoulli trial. There are two possible outcomes: 1 and 0, the first occurring with probability p and the second having probability 1 − p. The expected value in this trial will be equal to μ = 1 · p + 0 · (1−p) = p. The variance in this trial is calculated similarly: σ² = (1−p)²·p + (0−p)²·(1−p) = p(1 − p).

The generic binomial distribution is a sum of n independent Bernoulli trials. The mean and the variance of such distributions are equal to the sums of means and variances of each individual trial:

$\mu_n = \sum_{k=1}^n \mu = np, \qquad \sigma^2_n = \sum_{k=1}^n \sigma^2 = np(1 - p).$
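The argument above can be mirrored numerically: compute the mean and variance of one Bernoulli trial, multiply by n, and compare with the moments obtained directly from the binomial mass function. This is only a sanity check under assumed values n = 10 and p = 0.3; the helper names are hypothetical.

```python
from math import comb

def bernoulli_moments(p):
    """Mean and variance of a single trial with outcomes 1 and 0."""
    mean = 1 * p + 0 * (1 - p)
    variance = (1 - mean) ** 2 * p + (0 - mean) ** 2 * (1 - p)
    return mean, variance

def binomial_moments_from_pmf(n, p):
    """Mean and variance computed directly from the B(n, p) mass function."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, variance

n, p = 10, 0.3  # assumed example values
mu, sigma2 = bernoulli_moments(p)
summed = (n * mu, n * sigma2)            # sums over n independent trials
direct = binomial_moments_from_pmf(n, p)
print(summed, direct)
```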

Mode and median

Usually the mode of a binomial B(n, p) distribution is equal to ⌊(n + 1)p⌋, where ⌊ ⌋ is the floor function. However, when (n + 1)p is an integer and p is neither 0 nor 1, the distribution has two modes: (n + 1)p and (n + 1)p − 1. When p is equal to 0 or 1, the mode is 0 or n, respectively. These cases can be summarized as follows:

$\text{mode} = \begin{cases} \lfloor (n+1)\,p\rfloor & \text{if }(n+1)p\text{ is 0 or a noninteger}, \\ (n+1)\,p\ \text{ and }\ (n+1)\,p - 1 &\text{if }(n+1)p\in\{1,\dots,n\}, \\ n & \text{if }(n+1)p = n + 1. \end{cases}$

In general, there is no single formula to find the median for a binomial distribution, and it may even be non-unique. However several special results have been established:

If np is an integer, then the mean, median, and mode coincide.^[2]
Any median m must lie within the interval ⌊np⌋ ≤ m ≤ ⌈np⌉.^[3]
A median m cannot lie too far away from the mean: |m − np| ≤ min{ ln 2, max{p, 1 − p} }.^[4]
The median is unique and equal to m = round(np) in cases when either p ≤ 1 − ln 2 or p ≥ ln 2 or |m − np| ≤ min{p, 1 − p} (except for the case when p = ½ and n is odd).^[3]^[4]
When p = 1/2 and n is odd, any number m in the interval ½(n − 1) ≤ m ≤ ½(n + 1) is a median of the binomial distribution. If p = 1/2 and n is even, then m = n/2 is the unique median.
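These statements can be explored by listing the integer medians directly. The sketch below assumes the usual definition, under which m is a median when Pr(X ≤ m) ≥ 1/2 and Pr(X ≥ m) ≥ 1/2, and uses exact fractions to avoid rounding at the boundary; `integer_medians` is a hypothetical helper and reports only integer values.

```python
from fractions import Fraction
from math import comb

def integer_medians(n, p):
    """All integers m with Pr(X <= m) >= 1/2 and Pr(X >= m) >= 1/2.
    p should be a Fraction so comparisons with 1/2 are exact."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    half = Fraction(1, 2)
    medians = []
    cumulative = Fraction(0)
    for k in range(n + 1):
        below = cumulative        # Pr(X < k)
        cumulative += pmf[k]      # Pr(X <= k)
        if cumulative >= half and 1 - below >= half:
            medians.append(k)
    return medians

print(integer_medians(5, Fraction(1, 2)))    # p = 1/2, n odd
print(integer_medians(10, Fraction(1, 2)))   # p = 1/2, n even
print(integer_medians(10, Fraction(3, 10)))  # np = 3 is an integer
```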

Covariance between two binomials

If two binomially distributed random variables X and Y are observed together, estimating their covariance can be useful. Using the definition of covariance, in the case n = 1 we have

$\operatorname{Cov}(X, Y) = \operatorname{E}(XY) - \mu_X \mu_Y.$

The first term is non-zero only when both X and Y are one, and μ_X and μ_Y are equal to the two probabilities. Defining p_B as the probability of both happening at the same time, this gives

$\operatorname{Cov}(X, Y) = p_B - p_X p_Y, \,$

and for n such paired trials, since different trials are independent of each other,

$\operatorname{Cov}(X, Y)_n = n ( p_B - p_X p_Y ). \,$

If X and Y are the same variable, this reduces to the variance formula given above.
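A small sketch can confirm the single-trial formula from the joint distribution of one paired trial. The probabilities used are illustrative assumptions, and the function names are hypothetical.

```python
from fractions import Fraction

def single_trial_covariance(p_x, p_y, p_b):
    """Cov(X, Y) for one paired Bernoulli trial, from the definition.
    p_b is the probability that both X and Y equal one."""
    joint = {
        (1, 1): p_b,
        (1, 0): p_x - p_b,
        (0, 1): p_y - p_b,
        (0, 0): 1 - p_x - p_y + p_b,
    }
    e_xy = sum(x * y * q for (x, y), q in joint.items())
    e_x = sum(x * q for (x, _), q in joint.items())
    e_y = sum(y * q for (_, y), q in joint.items())
    return e_xy - e_x * e_y

def binomial_covariance(n, p_x, p_y, p_b):
    """Covariance after n independent paired trials."""
    return n * single_trial_covariance(p_x, p_y, p_b)

# Assumed example: p_X = 1/2, p_Y = 2/5, p_B = 3/10.
p_x, p_y, p_b = Fraction(1, 2), Fraction(2, 5), Fraction(3, 10)
print(single_trial_covariance(p_x, p_y, p_b), p_b - p_x * p_y)
```

Setting Y equal to X (so that p_B = p_X = p_Y) reproduces the variance np(1 − p).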

Algebraic derivations of mean and variance

We derive these quantities from first principles. Certain particular sums occur in these two derivations. We rearrange the sums and terms so that sums solely over complete binomial probability mass functions (pmf) arise, which are always unity

$\sum_{k=0}^n \operatorname{Pr}(X=k) = \sum_{k=0}^n {n\choose k}p^k(1-p)^{n-k} = 1.$

We apply the definition of the expected value of a discrete random variable to the binomial distribution

$\operatorname{E}(X) = \sum_k x_k \cdot \operatorname{Pr}(x_k) = \sum_{k=0}^n k \cdot \operatorname{Pr}(X=k) = \sum_{k=0}^n k \cdot {n\choose k}p^k(1-p)^{n-k}.$

The first term of the series (with index k = 0) has value 0 since the first factor, k, is zero. It may thus be discarded, i.e. we can change the lower limit to: k = 1

$\operatorname{E}(X) = \sum_{k=1}^n k \cdot \frac{n!}{k!(n-k)!} p^k(1-p)^{n-k} = \sum_{k=1}^n k \cdot \frac{n\cdot(n-1)!}{k\cdot(k-1)!(n-k)!} \cdot p \cdot p^{k-1}(1-p)^{n-k}.$

We've pulled factors of n and k out of the factorials, and one power of p has been split off. We are preparing to redefine the indices.

$\operatorname{E}(X) = np \cdot \sum_{k=1}^n \frac{(n-1)!}{(k-1)!(n-k)!} p^{k-1}(1-p)^{n-k}$

We rename m = n − 1 and s = k − 1. The value of the sum is not changed by this, but it now becomes readily recognizable

$\operatorname{E}(X) = np \cdot \sum_{s=0}^m \frac{m!}{s!(m-s)!} p^s(1-p)^{m-s} = np \cdot \sum_{s=0}^m {m\choose s} p^s(1-p)^{m-s}.$

The ensuing sum is a sum over a complete binomial pmf (of one order lower than the initial sum, as it happens). Thus

$\operatorname{E}(X) = np \cdot 1 = np.$

^[5]

Variance

It can be shown that the variance is equal to (see: Computational formula for the variance):

$\operatorname{Var}(X) = \operatorname{E}(X^2) - (\operatorname{E}(X))^2.$

In using this formula we see that we now also need the expected value of X²:

$\operatorname{E}(X^2) = \sum_{k=0}^n k^2 \cdot \operatorname{Pr}(X=k) = \sum_{k=0}^n k^2 \cdot {n\choose k}p^k(1-p)^{n-k}.$

We can use our experience gained above in deriving the mean. We know how to process one factor of k. This gets us as far as

$\operatorname{E}(X^2) = np \cdot \sum_{s=0}^m k \cdot {m\choose s} p^s(1-p)^{m-s} = np \cdot \sum_{s=0}^m (s+1) \cdot {m\choose s} p^s(1-p)^{m-s}$

(again, with m = n − 1 and s = k − 1). We split the sum into two separate sums and we recognize each one

$\operatorname{E}(X^2) = np \cdot \bigg( \sum_{s=0}^m s \cdot {m\choose s} p^s(1-p)^{m-s} + \sum_{s=0}^m 1 \cdot {m\choose s} p^s(1-p)^{m-s} \bigg).$

The first sum is identical in form to the one we calculated in the Mean (above). It sums to mp. The second sum is unity.

$\operatorname{E}(X^2) = np \cdot ( mp + 1) = np((n-1)p + 1) = np(np - p + 1).$

Using this result in the expression for the variance, along with the Mean (E(X) = np), we get

$\operatorname{Var}(X) = \operatorname{E}(X^2) - (\operatorname{E}(X))^2 = np(np - p + 1) - (np)^2 = np(1-p).$

Using falling factorials to find E(X²)

We have

$\operatorname{E}(X^2) = \sum_{k=0}^n k^2 \cdot \operatorname{Pr}(X=k) = \sum_{k=0}^n k^2 \cdot {n\choose k}p^k(1-p)^{n-k}.$

But

$k^2= k(k - 1) + k.\,$

$\begin{align} \operatorname{E}(X^2) & = \sum_{k=0}^n (k(k - 1)+ k) \cdot {n\choose k}p^k(1-p)^{n-k} \\ & = \sum_{k=0}^n k ( k - 1 ) {n\choose k}p^k(1-p)^{n-k} + \sum_{k=0}^n k {n\choose k}p^k(1-p)^{n-k} \\ & = \sum_{k=2}^n k ( k - 1 ) {n\choose k}p^k(1-p)^{n-k} + \sum_{k=1}^n k {n\choose k}p^k(1-p)^{n-k} \\ & = \sum_{k=2}^n n ( n - 1 ) {n -2\choose k - 2}p^k(1-p)^{n-k} + \sum_{k=1}^n n {n - 1 \choose k - 1} p^k (1-p)^{n-k} \\ & = \sum_{k=0}^{n-2} n ( n - 1 ) {n -2\choose k}p^{k+2}(1-p)^{(n-2)-k} + \sum_{k=0}^{n-1} n {n - 1 \choose k} p^{k+1} (1-p)^{(n-1)-k} \\ & = n(n-1)p^2 \underbrace{\sum_{k=0}^{n-2} {n - 2 \choose k} p^k (1 - p)^{(n-2)-k}}_{= 1} + np \underbrace{ \sum_{k=0}^{n-1} {n - 1 \choose k} p^k (1-p)^{(n-1)-k}}_{=1} \\ & = n(n-1)p^2 + np \\ & = n^2p^2 - np^2 + np. \end{align}$

Thus

$\operatorname{Var}(X) = \operatorname{E}(X^2) - (\operatorname{E}(X))^2 = (n^2p^2 - np^2 + np) - n^2p^2 = np(1 - p).$
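The key identities in this derivation, E[X(X − 1)] = n(n − 1)p² and E(X) = np, can be verified exactly for a particular case. The sketch below uses rational arithmetic; n = 7 and p = 2/5 are arbitrary assumed values, and `expectation` is a hypothetical helper.

```python
from fractions import Fraction
from math import comb

def expectation(g, n, p):
    """E[g(X)] for X ~ B(n, p), summed over the whole support."""
    return sum(g(k) * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

n, p = 7, Fraction(2, 5)  # assumed example values

falling = expectation(lambda k: k * (k - 1), n, p)  # E[X(X - 1)]
mean = expectation(lambda k: k, n, p)               # E[X]
second_moment = falling + mean                      # since k^2 = k(k - 1) + k
variance = second_moment - mean**2

print(falling, n * (n - 1) * p**2)
print(variance, n * p * (1 - p))
```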

Relationship to other distributions

Sums of binomials

If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables, then X + Y is again a binomial variable; its distribution is

$X+Y \sim B(n+m, p).\,$
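Because the distribution of a sum of independent variables is the convolution of their mass functions, the statement can be checked by convolving two binomial mass functions with the same p. This is a sketch with assumed sizes n = 3 and m = 4 and exact fractions; the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_pmf_list(n, p):
    """Mass function of B(n, p) as a list indexed by k."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def convolve(a, b):
    """Mass function of the sum of two independent variables."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

p = Fraction(1, 3)  # the same success probability for both variables
sum_pmf = convolve(binomial_pmf_list(3, p), binomial_pmf_list(4, p))
print(sum_pmf == binomial_pmf_list(7, p))
```

The shared p matters: with different success probabilities the sum is in general not binomial.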

Bernoulli distribution

The Bernoulli distribution is a special case of the binomial distribution, where n = 1. Symbolically, X ~ B(1, p) has the same meaning as X ~ Bern(p). Conversely, any binomial distribution, B(n, p), is the sum of n independent Bernoulli trials, Bern(p), each with the same probability p.

Normal approximation

Binomial PDF and normal approximation for n = 6 and p = 0.5

If n is large enough, then the skew of the distribution is not too great. In this case, if a suitable continuity correction is used, then an excellent approximation to B(n, p) is given by the normal distribution

$\mathcal{N}(np,\, np(1-p)).$

The approximation generally improves as n increases (at least 20) and is better when p is not near to 0 or 1.^[6] Various rules of thumb may be used to decide whether n is large enough, and p is far enough from the extremes of zero or unity:

One rule is that both x = np and n(1 − p) must be greater than 5. However, the specific threshold varies from source to source and depends on how good an approximation one wants; some sources give 10, which gives virtually the same results as the following rule until n is very large (for example, x = 11 and n = 7752).

A second rule^[6] is that for $n>5$ the normal approximation is adequate if

$\left| \frac{1}{\sqrt{n}} \left( \sqrt{\frac{1-p}{p}} - \sqrt{\frac{p}{1-p}} \right) \right| < 0.3.$

Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values, that is if

$\mu \pm 3 \sigma = np \pm 3 \sqrt{np(1-p)} \in [0,n]. \,$
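The three rules of thumb above can be gathered into one helper for quick comparison. This is only a sketch: the function name, the dictionary keys and the default threshold of 5 are assumptions for illustration, and p is assumed to lie strictly between 0 and 1.

```python
from math import sqrt

def normal_approx_rules(n, p, threshold=5):
    """Evaluate the three rules of thumb for 0 < p < 1."""
    sigma = sqrt(n * p * (1 - p))
    skew_term = abs((sqrt((1 - p) / p) - sqrt(p / (1 - p))) / sqrt(n))
    return {
        # Both np and n(1 - p) exceed the chosen threshold.
        "np_and_nq_above_threshold": n * p > threshold and n * (1 - p) > threshold,
        # The skewness-based rule, stated for n > 5.
        "skewness_rule": n > 5 and skew_term < 0.3,
        # Mean plus or minus three standard deviations stays within [0, n].
        "three_sigma_rule": n * p - 3 * sigma >= 0 and n * p + 3 * sigma <= n,
    }

print(normal_approx_rules(100, 0.5))
print(normal_approx_rules(10, 0.1))
```

The rules do not always agree with one another, so a disagreement is a signal to check the approximation against exact probabilities.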

It can also be shown that the inflection points of the approximating normal density occur at

$np \pm \sqrt{np(1-p)} \,$

The following is an example of applying a continuity correction: Suppose one wishes to calculate Pr(X ≤ 8) for a binomial random variable X. If Y has a distribution given by the normal approximation, then Pr(X ≤ 8) is approximated by Pr(Y ≤ 8.5). The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.
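The effect of the continuity correction can be seen numerically. The example above does not fix n and p, so the sketch below assumes n = 20 and p = 0.5 purely for illustration; the normal distribution function is built from the standard library's `math.erf`.

```python
from math import comb, erf, sqrt

def binomial_cdf(k, n, p):
    """Exact Pr(X <= k) for integer k."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mean, sd):
    """Pr(Y <= x) for Y normal with the given mean and standard deviation."""
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

n, p = 20, 0.5  # assumed values, not specified in the text
mean, sd = n * p, sqrt(n * p * (1 - p))

exact = binomial_cdf(8, n, p)
corrected = normal_cdf(8.5, mean, sd)   # with continuity correction
uncorrected = normal_cdf(8, mean, sd)   # without it

print(f"exact       {exact:.4f}")
print(f"corrected   {corrected:.4f}")
print(f"uncorrected {uncorrected:.4f}")
```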

This approximation, known as the de Moivre–Laplace theorem, is a huge time-saver (exact calculations with large n are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1738. Nowadays, it can be seen as a consequence of the central limit theorem, since B(n, p) is a sum of n independent, identically distributed Bernoulli variables with parameter p. This fact is the basis of a hypothesis test, a "proportion z-test," for the value of p using x/n, the sample proportion and estimator of p, in a common test statistic.^[7]

For example, suppose you randomly sample n people out of a large population and ask them whether they agree with a certain statement. The proportion of people who agree will depend on the sample. If you sampled groups of n people repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion p of agreement in the population and with standard deviation σ = (p(1 − p)/n)^(1/2). Large sample sizes n are good because the standard deviation, as a proportion of the expected value, gets smaller, which allows a more precise estimate of the unknown parameter p.

Poisson approximation

The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np remains fixed. Therefore the Poisson distribution with parameter λ = np can be used as an approximation to the binomial distribution B(n, p) if n is sufficiently large and p is sufficiently small. According to two rules of thumb, this approximation is good if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and np ≤ 10.^[8]
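The quality of the Poisson approximation can be inspected by comparing the two mass functions directly. The values n = 100 and p = 0.05 are assumed for illustration and satisfy both rules of thumb; the Poisson probabilities are computed through logarithms to avoid overflow for large k.

```python
from math import comb, exp, lgamma, log

def binomial_pmf(k, n, p):
    """Pr(K = k) for K ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Pr(K = k) for a Poisson variable with parameter lam > 0."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))

n, p = 100, 0.05  # assumed example values
lam = n * p

# Largest pointwise difference over the binomial support.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1))
print(f"largest pointwise difference: {max_gap:.4f}")
```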

Limits

As n approaches ∞ and p approaches 0 while np remains fixed at λ > 0 or at least np approaches λ > 0, then the Binomial(n, p) distribution approaches the Poisson distribution with expected value λ.

As n approaches ∞ while p remains fixed, the distribution of

${X-np \over \sqrt{np(1-p)\ }}$

approaches the normal distribution with expected value 0 and variance 1 (this is just a specific case of the Central Limit Theorem).

Generating binomial random variates

Luc Devroye, Non-Uniform Random Variate Generation, New York: Springer-Verlag, 1986. See especially Chapter X, Discrete Univariate Distributions.
doi:10.1145/42372.42381
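As a naive baseline only, and not one of the algorithms from the references above, a B(n, p) variate can be produced by summing n independent Bernoulli draws, which follows directly from the definition of the distribution. The function name and the seed are assumptions made for this sketch; its cost grows linearly in n, which is why specialised methods exist.

```python
import random

def binomial_variate(n, p, rng):
    """Draw one B(n, p) variate by counting successes in n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(12345)  # fixed seed so the run is repeatable
samples = [binomial_variate(10, 0.3, rng) for _ in range(10_000)]

sample_mean = sum(samples) / len(samples)
print(f"sample mean {sample_mean:.3f} (expected value np = 3.0)")
```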

References

[1] Matousek, J.; Vondrak, J. The Probabilistic Method (lecture notes).
[2] Neumann, P. (1966). "Über den Median der Binomial- and Poissonverteilung" (in German). Wissenschaftliche Zeitschrift der Technischen Universität Dresden 19: 29–33.
[3] Kaas, R.; Buhrman, J.M. (1980). "Mean, Median and Mode in Binomial Distributions". Statistica Neerlandica 34 (1): 13–18. doi:10.1111/j.1467-9574.1980.tb00681.x.
[4] doi:10.1016/0167-7152(94)00090-U
[5] Morse, Philip (1969). Thermal Physics. New York: W. A. Benjamin. ISBN 0805372024.
[6] Box, Hunter and Hunter (1978). Statistics for experimenters. Wiley. p. 130.
[7] NIST/SEMATECH, "7.2.4. Does the proportion of defectives meet requirements?", e-Handbook of Statistical Methods.
[8] NIST/SEMATECH, "6.3.3.1. Counts Control Charts", e-Handbook of Statistical Methods.

External links

Web Based Binomial Probability Distribution Calculator (does not require java)
Binomial Probabilities Simple Explanation
SOCR Binomial Distribution Applet
CAUSEweb.org Many resources for teaching Statistics including Binomial Distribution
"Binomial Distribution" by Chris Boucher, Wolfram Demonstrations Project, 2007.
Binomial Distribution Properties and Java simulation from cut-the-knot
Statistics Tutorial: Binomial Distribution

